1. Data Download and Extract
Downloaded data from GEO(GSE125449) and unzip. GSE125449-Page
Set1: discovery set, 12 patients, 5115 cells after initial quality controls
Set2: validation set, 7 patients
Features:
- Col1: gene.ids
- Col2: gene.symbol
zcat ../data/Set1.raw/features.tsv.gz | head## ENSG00000238009 RP11-34P13.7
## ENSG00000279457 FO538757.2
## ENSG00000228463 AP006222.2
## ENSG00000237094 RP4-669L17.10
## ENSG00000230021 RP5-857K21.4
## ENSG00000237491 RP11-206L10.9
## ENSG00000177757 FAM87B
## ENSG00000225880 LINC00115
## ENSG00000230368 FAM41C
## ENSG00000230699 RP11-54O7.1
Barcodes:
List of barcodes
zcat ../data/Set1.raw/barcodes.tsv.gz | head## AAACCTGAGGCGTACA-1
## AAACGGGAGATCGATA-1
## AAAGCAAAGATCGGGT-1
## AAATGCCGTCTCAACA-1
## AACACGTCACGGCTAC-1
## AACCGCGAGACGCTTT-1
## AACCGCGGTGACCAAG-1
## AACGTTGCATGCCACG-1
## AACTCAGAGGCTATCT-1
## AACTCCCAGCTCTCGG-1
Matrix:
- Col 1: Line number of Gene ID in features.tsv
- Col 2: Line number of cell info in barcodes.tsv
- Col 3: Gene TPM
zcat ../data/Set1.raw/matrix.mtx.gz | head## %%MatrixMarket matrix coordinate integer general
## 20124 5115 8517199
## 12 1 2
## 17 1 1
## 39 1 1
## 107 1 3
## 214 1 1
## 215 1 1
## 217 1 1
## 230 1 1
2. Examine Data and Setup Seurat Object
library(Seurat)
library(dplyr)
library(cowplot)
library(Cairo)2.1 Read10X
Rename GES125449_Set1/2_barcodes.tsv.gz to barcodes.tsv.gz, GES125449_Set1/2_genes.tsv.gz to features.tsv.gz, GES125449_Set1/2_matrix.mtx.gz to matrix.mtx.gz and store in separated files named Set1.raw and Set2.raw.
# Load dataset Set1
set1 <- Read10X(data.dir = "../data/Set1.raw/")
# Check the data
set1[1:10, 1:3]## 10 x 3 sparse Matrix of class "dgCMatrix"
## AAACCTGAGGCGTACA-1 AAACGGGAGATCGATA-1 AAAGCAAAGATCGGGT-1
## RP11-34P13.7 . . .
## FO538757.2 . . 1
## AP006222.2 . 1 .
## RP4-669L17.10 . . .
## RP5-857K21.4 . . .
## RP11-206L10.9 . . .
## FAM87B . . 1
## LINC00115 . . .
## FAM41C . . .
## RP11-54O7.1 . . .
2.2 Examine Data
Here we see the upper left corner of the sparse matrix. The columns are indexed by 10x cell barcodes (each 16 nt long, e.g. AAACCTGAGGCGTACA), and the rows are the gene names.
There are several zeroes here (indicated by the “.” symbol); this is the most common value in these sparse matrices. Next, look at the dimensions of this matrix.
dim(set1) # report number of genes (rows) and number of cells (columns)## [1] 20124 5115
Here we see the counts matrix has 20124 genes and 5115 cells.
Then do some plotting to look at the number of reads per cell, expressed genes per cell (often called complexity), and rarity of genes (cells expressing genes).
- Counts per cell:
counts_per_cell <- Matrix::colSums(set1)
cat("counts per cell: ", counts_per_cell[1:5], "\n") ## counts for first 5 cells## counts per cell: 2235 3555 7751 3168 2040
hist(log10(counts_per_cell+1),main='counts per cell',col='Thistle')- Genes per cell:
genes_per_cell <- Matrix::colSums(set1 > 0) # count gene only if it has non-zero reads mapped.
cat("counts for non-zero genes: ", genes_per_cell[1:5]) ## counts for first 5 genes## counts for non-zero genes: 909 1266 2495 1100 1096
hist(log10(genes_per_cell+1), main='genes per cell', col='Thistle')plot(counts_per_cell, genes_per_cell, log='xy', col='Thistle')
title('counts vs genes per cell')Here we rank each cell by its library complexity, ie the number of genes detected per cell. This is a very useful plot as it shows the distribution of library complexity in the sequencing run. One can use this plot to investigate observations (potential cells) that are actually failed libraries (lower end outliers) or observations that are cell doublets (higher end outliers).
plot(sort(genes_per_cell), xlab='cell', log='y', main='genes per cell (ordered)')2.3 Initialize Seurat Object
Then initialize the Seurat object with the raw data (non-normalized data):
min.cells = 3Keep all genes expressed in >= 3 cells (~0.1% of the data)
min.features = 200Keep all cells with at least 200 detected genes
# Initialize Seurat object
set1.obj <- CreateSeuratObject(counts = set1, project = "10X_liver_cancer", assay = "RNA")
set1.obj## An object of class Seurat
## 20124 features across 5115 samples within 1 assay
## Active assay: RNA (20124 features, 0 variable features)
rm(set1)set1.obj@assays$RNA@counts is a slot that stores the original gene count matrix. We can view the first 10 rows (genes) and the first 10 columns (cells).
set1.obj@assays$RNA@counts[1:10,1:10]## 10 x 10 sparse Matrix of class "dgCMatrix"
##
## RP11-34P13.7 . . . . . . . . . .
## FO538757.2 . . 1 . . . . . . .
## AP006222.2 . 1 . . . . . . . .
## RP4-669L17.10 . . . . . . . . . .
## RP5-857K21.4 . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . .
## FAM87B . . 1 . . . . . . .
## LINC00115 . . . . . . . . . .
## FAM41C . . . . . . . . . .
## RP11-54O7.1 . . . . . . . . . .
drop = TRUE will turn the meta data into a names vector(not shown)
# One can pull multiple values from the data frame at any time
head(x = set1.obj@meta.data[c('nCount_RNA', 'nFeature_RNA')])2.4 Add Meta Data
Organize meta data used later:
* Provided sample info
set1.s1 <- read.table("../data/sample.data/GSE125449_Set1_samples.txt",fill = TRUE,header = TRUE)
set1.s1 <- set1.s1[,-4]
colnames(set1.s1)[3] <- "Type"
head(set1.s1)- Organized according to the information provided in paper
library(stringr)
set1.s2 <- read.csv("../data/sample.data/set1Sample.CSV",header = F)
colnames(set1.s2) <- c("GSM Info", "Sample", "Clinical Info","Classification")
head(set1.s2)Merge info above:
set1.meta <- merge(set1.s1,set1.s2,by = "Sample")
rownames(set1.meta) <- set1.meta[,2]
set1.meta <- set1.meta[,-2]
head(set1.meta)Make sure rowname of meta data is consistant with colname of counts
identical(rownames(set1.meta),colnames(set1.obj@assays$RNA@counts))## [1] TRUE
Add meta data
set1.obj <- AddMetaData(object = set1.obj, metadata = set1.meta)
head(set1.obj@meta.data)3. Quality Control
While the CreateSeuratObject imposes a basic minimum gene-cutoff, then we need to filter out cells at this stage based on technical or biological parameters.
3.1 Remove Doublets
Doublets/Mulitples of cells in the same well/droplet is a common issue in scRNAseq protocols. Especially in droplet-based methods whith overloading of cells. In a typical 10x experiment the proportion of doublets is linearly dependent on the amount of loaded cells. As indicated from the Chromium user guide, doublet rates are about as follows:
Most doublet detectors simulates doublets by merging cell counts and predicts doublets as cells that have similar embeddings as the simulated doublets. Most such packages need an assumption about the number/proportion of expected doublets in the dataset. The data you are using is subsampled, but the orignial datasets contained about 5 000 cells per sample, hence we can assume that they loaded about 9 000 cells and should have a doublet rate at about 4%. OBS! Ideally doublet prediction should be run on each sample separately, especially if your different samples have different proportions of celltypes.
Here, we will use DoubletFinder to predict doublet cells. But before doing doublet detection we need to run scaling, variable gene selection and pca, as well as UMAP for visualization.
Github-DoubletFinder
DoubletFinder is an R package that predicts doublets in single-cell RNA sequencing data.
DoubletFinder can be broken up into 4 steps: 1. Generate artificial doublets from existing scRNA-seq data
2. Pre-process merged real-artificial data
3. Perform PCA and use the PC distance matrix to find each cell’s proportion of artificial k nearest neighbors (pANN)
4. Rank order and threshold pANN values according to the expected number of doublets
library(DoubletFinder)
library(patchwork)
library(sctransform)3.1.1 Pre-process Seurat object (sctransform)
set1.obj <- SCTransform(set1.obj, verbose = F) %>% RunPCA(verbose = F)ElbowPlot(set1.obj)pc.num=1:20
set1.obj <- RunUMAP(set1.obj, dims=pc.num,verbose = F)3.1.2 Optimize the parameters
To optimize the parameters, you can run the paramSweep function in the package.
Use SCTransform!!!!Consume lots of CPU!!!!!
# ## parameter optimization, look for the best pK
# sweep.res.list <- paramSweep_v3(set1.obj, PCs = pc.num, sct = T,num.cores = 1)
# # Use log transform
# sweep.stats <- summarizeSweep(sweep.res.list, GT = F)
# # Show the best parameter
# bcmvn <- find.pK(sweep.stats)
# # Extract the best pK
# pK_bcmvn <- bcmvn$pK[which.max(bcmvn$BCmetric)] %>% as.character() %>% as.numeric()
# save(pK_bcmvn, file = "../output/3.1 Remove Doublets/pk_bcmvn.RData")load("../data/RemoveDoublets/pk_bcmvn.RData")3.1.3 Run DoubletFinder
## Exclude doublets
# DoubletRate is set according to the table 10X provided
DoubletRate = 0.023
# Estimate the percentage of homotypic doublets
homotypic.prop <- modelHomotypic(set1.obj$Type)
nExp_poi <- round(DoubletRate*ncol(set1.obj))
# Adjust for homotypic doublets
nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))## Run DoubletFinder with varying classification stringencies
set1.obj <- doubletFinder_v3(set1.obj, PCs = pc.num, pN = 0.25, pK = pK_bcmvn,
nExp = nExp_poi.adj, reuse.pANN = F, sct = T)## Present the res, classification info is saved in set1.obj@meta.data
DF.name <- colnames(set1.obj@meta.data)[grepl("DF.classification", colnames(set1.obj@meta.data))]cowplot::plot_grid(ncol = 2, DimPlot(set1.obj, group.by = "Clinical.Info") + NoAxes(), DimPlot(set1.obj, group.by = DF.name) + NoAxes())We should expect that two cells have more detected genes than a single cell, lets check if our predicted doublets also have more detected genes in general.
VlnPlot(set1.obj, features = "nFeature_RNA", group.by = DF.name, pt.size = 0.1)Now, lets remove all predicted doublets from our data.
set1.obj <- set1.obj[, set1.obj@meta.data[, DF.name] == "Singlet"]
set1.obj## An object of class Seurat
## 40232 features across 5017 samples within 2 assays
## Active assay: SCT (20108 features, 3000 variable features)
## 1 other assay present: RNA
## 2 dimensional reductions calculated: pca, umap
3.2 Calculate QC
Having the data in a suitable format, we can start calculating some quality metrics. We can for example calculate the percentage of mitocondrial and ribosomal genes per cell and add to the metadata. This will be helpfull to visualize them across different metadata parameteres (i.e. datasetID and chemistry version).
A few QC metrics commonly used by the community include:
The number of unique genes detected in each cell.
- Low-quality cells or empty droplets will often have very few genes
- Cell doublets or multiplets may exhibit an aberrantly high gene count
- Low-quality cells or empty droplets will often have very few genes
The total number of molecules detected within a cell (correlates strongly with unique genes)
The percentage of reads that map to the mitochondrial genome
- Low-quality / dying cells often exhibit extensive mitochondrial contamination
- Calculate mitochondrial QC metrics with the
PercentageFeatureSet()function, which calculates the percentage of counts originating from a set of features
- Use the set of all genes starting with MT- as a set of mitochondrial genes
- Low-quality / dying cells often exhibit extensive mitochondrial contamination
The Seurat object initialization step above only considered cells that expressed at least 200 genes. Additionally, we would like to exclude cells that are damaged.
A common metric to judge this is the relative expression of mitochondrially derived genes. When the cells apoptose due to stress, their mitochondria becomes leaky and there is widespread RNA degradation. Thus a relative enrichment of mitochondrially derived genes can be a tell-tale sign of cell stress.
Citing from “Simple Single Cell” workflows (Lun, McCarthy & Marioni, 2017): “High proportions are indicative of poor-quality cells (Islam et al. 2014; Ilicic et al. 2016), possibly because of loss of cytoplasmic RNA from perforated cells. The reasoning is that mitochondria are larger than individual transcript molecules and less likely to escape through tears in the cell membrane.”
Here, we compute the proportion of transcripts that are of mitochondrial origin for every cell (PercentageFeatureSet adds percent.mt columns to object), and visualize its distribution as a violin plot. We also use the FeatureScatter function to observe how percent.mt correlates with other metrics.
# The [[ operator can add columns to object metadata.
set1.obj[["percent.mt"]] <- PercentageFeatureSet(object = set1.obj, pattern = "^MT-")In the same manner we will calculate the proportion gene expression that comes from ribosomal proteins.
# Way1: Doing it using Seurat function
set1.obj[["percent.ribo"]] <- PercentageFeatureSet(object = set1.obj, pattern = "^RP[SL]")
# Way2: Doing it manually
# ribo_genes <- rownames(alldata)[grep("^RP[SL]", rownames(alldata))]
# head(ribo_genes, 10)
# alldata$percent_ribo <- colSums(alldata@assays$RNA@counts[ribo_genes, ])/total_counts_per_cell
## [1] "RPL22" "RPL11" "RPS6KA1" "RPS8" "RPL5" "RPS27" "RPS6KC1"
## [8] "RPS7" "RPS27A" "RPL31"And finally, with the same method we will calculate proportion hemoglobin genes, which can give an indication of red blood cell contamination.
# Percentage hemoglobin genes - includes all genes starting with HB except HBP.
set1.obj[["percent.hb"]] <- PercentageFeatureSet(object = set1.obj, pattern = "^HB[^(P)]")PercentageFeatureSet have added columns to object@meta.data
head(set1.obj@meta.data)3.2 Plot QC
Now we can plot some of the QC-features as violin plots.
feats <- c("nFeature_RNA", "nCount_RNA", "percent.mt", "percent.ribo", "percent.hb")
VlnPlot(set1.obj, group.by = "Clinical.Info", features = feats, pt.size = 0.1, ncol = 3) + NoLegend()As you can see, there is quite some difference in quality for the 4 datasets, with for instance the H38 sample having fewer cells with many detected genes and more mitochondrial content. As the ribosomal proteins are highly expressed they will make up a larger proportion of the transcriptional landscape when fewer of the lowly expressed genes are detected. And we can plot the different QC-measures as scatter plots.
FeatureScatter(set1.obj, "nCount_RNA", "nFeature_RNA", group.by = "Clinical.Info", pt.size = 0.5)3.3 Filtering
3.3.1 Detection-based filtering
A standard approach is to filter cells with low amount of reads as well as genes that are present in at least a certain amount of cells. Here we will only consider cells with at least 200 detected genes and genes need to be expressed in at least 3 cells. Please note that those values are highly dependent on the library preparation method used.
selected_c <- WhichCells(set1.obj, expression = nFeature_RNA > 200)
selected_f <- rownames(set1.obj)[Matrix::rowSums(set1.obj) > 3]
set1.obj <- subset(set1.obj, features = selected_f, cells = selected_c)
set1.obj## An object of class Seurat
## 39522 features across 5017 samples within 2 assays
## Active assay: SCT (19761 features, 2995 variable features)
## 1 other assay present: RNA
## 2 dimensional reductions calculated: pca, umap
We can also see which genes contribute the most to such reads. We can for instance plot the percentage of counts per gene.
# Compute the relative expression of each gene per cell Use sparse matrix
# operations, if your dataset is large, doing matrix devisions the regular way
# will take a very long time.
par(mar = c(4, 8, 2, 1))
C <- set1.obj@assays$RNA@counts
C <- Matrix::t(Matrix::t(C)/Matrix::colSums(C)) * 100
most_expressed <- order(apply(C, 1, median), decreasing = T)[20:1]
boxplot(as.matrix(t(C[most_expressed,])), cex = 0.1, las = 1, xlab = "% total count per cell", col = (scales::hue_pal())(20)[20:1], horizontal = TRUE)As you can see, MALAT1 constitutes up to 10% of the UMIs from a single cell and the other top genes are mitochondrial and ribosomal genes. It is quite common that nuclear lincRNAs have correlation with quality and mitochondrial reads, so high detection of MALAT1 may be a technical issue. Let us assemble some information about such genes, which are important for quality control and downstream filtering.
3.3.2 Mito/Ribo filtering
We also have quite a lot of cells with high proportion of mitochondrial and low proportion of ribosomal reads. It could be wise to remove those cells, if we have enough cells left after filtering.
Another option would be to either remove all mitochondrial reads from the dataset and hope that the remaining genes still have enough biological signal.
A third option would be to just regress out the percent_mito variable during scaling.
In this case we had as much as 99.7% mitochondrial reads in some of the cells, so it is quite unlikely that there is much cell type signature left in those.
Looking at the plots, make reasonable decisions on where to draw the cutoff. In this case, the bulk of the cells are below 20% mitochondrial reads and that will be used as a cutoff. We will also remove cells with less than 5% ribosomal reads.
selected_mito <- WhichCells(set1.obj, expression = percent.mt < 20)
selected_ribo <- WhichCells(set1.obj, expression = percent.ribo > 5)
# and subset the object to only keep those cells
set1.obj <- subset(set1.obj, cells = selected_mito)
set1.obj <- subset(set1.obj, cells = selected_ribo)
dim(set1.obj)## [1] 19761 4322
table(set1.obj$Clinical.Info)##
## C25 C26 C29 C35 C39 H18 H21 H23 H28 H30 H37 H38
## 131 222 891 125 392 112 691 120 115 478 103 942
We can also notice that the percent_ribo are also highly variable, but that is expected since different cell types have different proportions of ribosomal content, according to their function.
3.3.3 Plot filtered QC
Lets plot the same QC-stats another time.
feats <- c("nFeature_RNA", "nCount_RNA", "percent.mt", "percent.ribo", "percent.hb")
VlnPlot(set1.obj, group.by = "Clinical.Info", features = feats, pt.size = 0.1, ncol = 3) + NoLegend()4. Expression normalization(SCTransform)
SCTransform:
R package for normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression
- Note that this single command replaces NormalizeData, ScaleData, and FindVariableFeatures
- Transformed data will be available in the SCT assay, which is set as the default after running sctransform.
- During normalization, we can also remove confounding sources of variation, for example, mitochondrial mapping percentage.
We suggest that users set these parameters to mark visual outliers on the dispersion plot, but the exact parameter settings may vary based on the data type, heterogeneity in the sample, and normalization strategy. The parameters here identify ~3,000 variable genes, and represent typical parameter settings for UMI data that is normalized to a total of 1e4 molecules.
# run sctransform
set1.obj <- SCTransform(set1.obj, vars.to.regress = "percent.mt", verbose = FALSE, seed.use = 2021)
set1.obj## An object of class Seurat
## 39326 features across 4322 samples within 2 assays
## Active assay: SCT (19565 features, 3000 variable features)
## 1 other assay present: RNA
## 2 dimensional reductions calculated: pca, umap
5. Dimensionality reduction
5.1 Principal components analysis
Run pca on the just the variable features found.
set1.obj <- RunPCA(object = set1.obj,
ndims.print = 1:5,
nfeatures.print = 5)## PC_ 1
## Positive: IGFBP7, RGS5, TAGLN, ACTA2, SPARCL1
## Negative: FTL, APOC1, AMBP, SERPINA1, ALB
## PC_ 2
## Positive: ALB, RBP4, AMBP, APOC3, APOA2
## Negative: HLA-DRA, CD74, HLA-DRB1, HLA-DPB1, HLA-DPA1
## PC_ 3
## Positive: HLA-DRA, HLA-DRB1, TYROBP, CD74, HLA-DPB1
## Negative: CCL5, KLRB1, IGKC, CD3D, IGHG4
## PC_ 4
## Positive: TAGLN, ACTA2, MYL9, TPM2, RGS5
## Negative: RAMP2, PLVAP, IFI27, RBP7, PLPP1
## PC_ 5
## Positive: CCL5, RGS5, CXCR4, SPARCL1, APOC3
## Negative: SPP1, SLPI, S100A6, GPX2, ELF3
Seurat v3 provides functions for visualizing:
- PCA - PCA plot coloured by a quantitative feature
- Scatter plot across single cells and individual features
- Variable Feature Plot
- Violin and Ridge plots
- Heatmaps
Visualizing PCA in DimPlot
DimPlot(object = set1.obj, reduction = "pca",group.by = "Clinical.Info")Visualizing PCA in FeaturePlot
# Dimensional reduction plot, with cells colored by a quantitative feature
FeaturePlot(object = set1.obj, features = "CD3G")Visualizing PCA in ScatterPlot
# Scatter plot across single cells, replaces GenePlot
FeatureScatter(object = set1.obj, feature1 = "CD3G", feature2 = "PC_1", group.by = "Clinical.Info")Visualizing PCA in Heatmap
In particular DimHeatmap allows for easy exploration of the primary sources of heterogeneity in a dataset, and can be useful when trying to decide which PCs to include for further downstream analyses. Both cells and genes are ordered according to their PCA scores.
# Heatmaps
DimHeatmap(object = set1.obj, reduction = "pca", cells = 150, balanced = TRUE)5.2 Determine statistically significant principal components
We can use ElbowPlot to identify significant PCs.
ElbowPlot(object = set1.obj)5.3 Cluster the cells
First calculate k-nearest neighbors and construct the SNN graph (FindNeighbors), then run FindClusters.
set1.obj <- FindNeighbors(set1.obj, reduction = "pca", dims = 1:20)Check resolution
library(clustree)
checkRes <- FindClusters(object = set1.obj, resolution = c(seq(.1,1.6,.2)), verbose = F)
clustree(checkRes@meta.data,prefix = "SCT_snn_res.")set1.obj <- FindClusters(object = set1.obj,
reduction = "pca",
dims = 1:20,
resolution = 0.5,
random.seed = 2021)## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 4322
## Number of edges: 128859
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9289
## Number of communities: 17
## Elapsed time: 0 seconds
DimPlot(set1.obj, label = TRUE) + NoLegend() 5.4 Run non-linear dimensional reduction (UMAP/tSNE)
Seurat continues to use tSNE as a powerful tool to visualize and explore these datasets.
The goal of these algorithms is to learn the underlying manifold of the data in order to place similar cells together in low-dimensional space. Cells within the graph-based clusters determined above should co-localize on these dimension reduction plots. As input to the UMAP and tSNE, we suggest using the same PCs as input to the clustering analysis.
5.4.1 tSNE
set1.obj <- RunTSNE(object = set1.obj, dims.use = 1:20, do.fast = TRUE)Figure 1B >With the linearly uncorrelated principal components (PCs) (k = 20), we performed t-distributed stochastic neighbor embedding (t-SNE) analysis (Figure 1B), which is a technique well-suited for visualization of high-dimensional data in a two-dimensional space.
# note that you can set do.label=T to help label individual clusters
DimPlot(object = set1.obj, group.by = "Clinical.Info",reduction = "tsne")Compared with Figure 1B in paper:
Caption: t-SNE plot of all the 5,115 single cells from 12 primary liver cancer patients (indicated by colors). Case ID was named according to histological subtypes of liver cancer, with H and C representing HCC and iCCA, respectively.
Figure 1C
We found some cells with high expression levels of epithelial marker genes, suggesting a potential tumoral origin (Figure 1C).
FeaturePlot(object = set1.obj, features = c("EPCAM"), cols = c("grey", "blue"), reduction = "tsne")Compared with Figure 1C in paper:
Caption: t-SNE plot of all the single cells colored by epithelial score. Epithelial score was determined by the average expression of epithelial marker genes.
5.4.2 UMAP
To visualize the two conditions side-by-side, we can use the split.by argument to show each condition colored by cluster.
set1.obj <- RunUMAP(set1.obj, reduction = "pca", dims = 1:20)
DimPlot(set1.obj, reduction = "umap", group.by = "Clinical.Info")6. Find differentially expressed genes (cluster biomarkers)
Seurat can help you find markers that define clusters via differential expression. By default, it identifes positive and negative markers of a single cluster (specified in ident.1), compared to all other cells. FindAllMarkers automates this process for all clusters, but you can also test groups of clusters vs. each other, or against all cells.
The min.pct argument requires a gene to be detected at a minimum percentage in either of the two groups of cells, and the thresh.test argument requires a gene to be differentially expressed (on average) by some amount between the two groups.
As another option to speed up these computations, max.cells.per.ident can be set. This will downsample each identity class to have no more cells than whatever this is set to. While there is generally going to be a loss in power, the speed increases can be significiant and the most highly differentially expressed genes will likely still rise to the top.
# find all markers of cluster 1
cluster1.markers <- FindMarkers(object = set1.obj, ident.1 = 0, min.pct = 0.25)
print(x = head(x = cluster1.markers, n = 5))## p_val avg_log2FC pct.1 pct.2 p_val_adj
## IL7R 0.000000e+00 1.459128 0.689 0.088 0.000000e+00
## LTB 0.000000e+00 2.136885 0.838 0.155 0.000000e+00
## CD52 7.171388e-258 1.613516 0.925 0.290 1.403082e-253
## RPS29 2.211157e-241 1.463332 1.000 0.965 4.326128e-237
## TRAC 6.189094e-238 1.309352 0.743 0.158 1.210896e-233
# find markers for every cluster compared to all remaining cells, report
# only the positive ones
# job::job(find_markers= { markers <- FindAllMarkers(object = set1.obj, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
# })
markers <- FindAllMarkers(object = set1.obj, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)markers %>% group_by(cluster) %>% top_n(1, avg_log2FC)The ROC test returns the ‘classification power’ for any individual marker (ranging from 0 - random, to 1 - perfect).
cluster1.markers <- FindMarkers(object = set1.obj, ident.1 = 0, thresh.use = 0.25, test.use = "roc", only.pos = TRUE)top5 <- markers %>% group_by(cluster) %>% top_n(5, avg_log2FC)
# setting slim.col.label to TRUE will print just the cluster IDS instead of
# every cell name
DoHeatmap(object = set1.obj, features = top5$gene, label = TRUE,group.by = "seurat_clusters")FeaturePlot(object = set1.obj, features = c("CD3G", "CD3E", "CD3D", "CD79A", "FCRL5", "PECAM1", "VWF", "COL1A2", "CD14"), cols = c("grey", "blue"), reduction = "tsne")7. Cell type annotation(SingleR)
This identifies marker genes from the reference and uses them to compute assignment scores (based on the Spearman correlation across markers) for each cell in the test dataset against each label in the reference. The label with the highest score is the assigned to the test cell, possibly with further fine-tuning to resolve closely related labels.
Each row of the output DataFrame contains prediction results for a single cell. Labels are shown before fine-tuning (first.labels), after fine-tuning (labels) and after pruning (pruned.labels), along with the associated scores.
library(SingleR)
library(celldex)
library(scater)
# hpca.se <- HumanPrimaryCellAtlasData()
# save(hpca.se,file="../data/SingleR/hpca.se.RData")The easiest way to use SingleR is to annotate cells against built-in references. In particular, the celldex package provides access to several reference datasets (mostly derived from bulk RNA-seq or microarray data) through dedicated retrieval functions.
set1.test <- as.data.frame(set1.obj[["RNA"]]@counts)
# load the ref and choose intersection
load("../data/SingleR/hpca.se.RData")
common_hpca <- intersect(rownames(set1.test), rownames(hpca.se))
hpca.se <- hpca.se[common_hpca,]
hpca.se## class: SummarizedExperiment
## dim: 15274 713
## metadata(0):
## assays(1): logcounts
## rownames(15274): SAMD11 NOC2L ... S100B PRMT2
## rowData names(0):
## colnames(713): GSM112490 GSM112491 ... GSM92233 GSM92234
## colData names(3): label.main label.fine label.ont
set1_forhpca.se <- SummarizedExperiment(assays=list(counts=set1.test[common_hpca,])) %>% logNormCounts()
set1_forhpca.se[["seurat_cluster"]] <- set1.obj$seurat_clusters
set1_forhpca.se## class: SummarizedExperiment
## dim: 15274 4322
## metadata(0):
## assays(2): counts logcounts
## rownames(15274): SAMD11 NOC2L ... S100B PRMT2
## rowData names(0):
## colnames(4322): AAACCTGAGGCGTACA-1 AAACGGGAGATCGATA-1 ...
## TTTATGCTCCTTAATC-13 TTTGTCAGTTTGGGCC-13
## colData names(1): seurat_cluster
# 然后,我们使用SingleR()函数进行细胞类型注释分析,
# 但要使用标记检测模式(marker detection mode),
# 该模式会考虑跨细胞表达的差异。在这里,使用Wilcoxon ranked sum test检验
# 来识别标签对之间比较的top markers。与默认的标记检测算法相比,此方法会更慢,
pred.main.hpca <- SingleR(test = set1_forhpca.se, ref = hpca.se,
labels = hpca.se$label.main,de.method="wilcox",
clusters = set1_forhpca.se$seurat_cluster)
table(pred.main.hpca$pruned.labels)##
## B_cell DC Endothelial_cells Epithelial_cells
## 1 1 4 2
## Hepatocytes Macrophage MSC Smooth_muscle_cells
## 2 1 1 1
## T_cells
## 2
result_main_hpca <- as.data.frame(pred.main.hpca$labels)
result_main_hpca$CB <- rownames(pred.main.hpca)
colnames(result_main_hpca) <- c('HPCA_Main', 'seurat_clusters')
write.table(result_main_hpca, file = "../output/7.Annotation/HPCA_Main.txt", sep = '\t', row.names = FALSE, col.names = TRUE, quote = FALSE)
head(result_main_hpca)We inspect the results using a heatmap of the per-cell and label scores. Ideally, each cell should exhibit a high score in one label relative to all of the others, indicating that the assignment to that label was unambiguous.
plotScoreHeatmap(pred.main.hpca)Add to set1.obj metadata
set1.obj@meta.data$CB <- rownames(set1.obj@meta.data)
set1.obj@meta.data <- merge(set1.obj@meta.data,result_main_hpca,by="seurat_clusters")
rownames(set1.obj@meta.data) <- set1.obj@meta.data$CB
head(set1.obj@meta.data)Examine the result of annotation.
p5 <- DimPlot(set1.obj, reduction = "tsne", group.by = "HPCA_Main", pt.size=0.5)+theme(
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
p6 <- DimPlot(set1.obj, reduction = "tsne", group.by = "ident", pt.size=0.5, label = TRUE,repel = TRUE)+theme(
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
plot_grid(p6, p5,rel_widths = c(2,2))| Conparison between SingleR annotation and the annotation from paper |
r p7 <- DimPlot(set1.obj, reduction = "tsne", group.by = "HPCA_Main", pt.size=0.5, label = TRUE,repel = TRUE)+theme( axis.line = element_blank(), axis.ticks = element_blank(),axis.text = element_blank() ) p8 <- DimPlot(set1.obj, reduction = "tsne", group.by = "Type", pt.size=0.5, label = TRUE,repel = TRUE)+theme( axis.line = element_blank(), axis.ticks = element_blank(),axis.text = element_blank() ) plot_grid(p7, p8, rel_widths = c(2,2)) |
8. Inferring copy number alterations
InferCNV is used to explore tumor single cell RNA-Seq data to identify evidence for somatic large-scale chromosomal copy number alterations, such as gains or deletions of entire chromosomes or large segments of chromosomes. This is done by exploring expression intensity of genes across positions of tumor genome in comparison to a set of reference ‘normal’ cells.
library(infercnv)
library(AnnoProbe)8.1 Preparing input data
Required input files:
* Raw Counts Matrix for Genes x Cells
* Sample annotation file
* Gene ordering file
8.1.3 Raw Counts Matrix for Genes x Cells
dgCMatrix to data.frame
@Dim: dimention information;
@Dimnames: rownames and colnames; @x: non-zero
@i: non-zero index which is consistent with @x
dat <- set1.obj@assays$SCT@counts ## dgCMatrix
dgc_as_matrix <- function(mat){
tmp <- matrix(data=0L, nrow = mat@Dim[1], ncol = mat@Dim[2])
row_pos <- mat@i+1
col_pos <- findInterval(seq(mat@x)-1,mat@p[-1])+1
val <- mat@x
for (i in seq_along(val)){
tmp[row_pos[i],col_pos[i]] <- val[i]
}
row.names(tmp) <- mat@Dimnames[[1]]
colnames(tmp) <- mat@Dimnames[[2]]
return(tmp)
}
dat <- dgc_as_matrix(dat)
dat <- as.data.frame(dat)
head(dat)8.1.3 Gene ordering file
geneInfor <- annoGene(rownames(dat),"SYMBOL",'human')
head(geneInfor)geneInfor <- geneInfor[!duplicated(geneInfor[,1]),]
length(unique(geneInfor[,1]))## [1] 16150
geneInfor <- geneInfor[,-c(2,3)]
write.table(geneInfor,file = "../data/inferCNV/geneInfo.txt",row.names = F,sep = "\t",col.names = F,quote = F)rawexpr <- dat
rawexpr <- rawexpr[intersect(row.names(rawexpr),geneInfor$SYMBOL),]
head(rawexpr)rawexpr <- as.matrix(rawexpr)
# write.table(rawexpr,file = "../data/inferCNV/rawexpr.txt",row.names = F,sep = "\t",col.names = F,quote = F)8.1.2 Sample annotation file
cellAnno <- set1.obj@meta.data[,c("CB","Type")]
head(cellAnno)write.table(cellAnno,file = "../data/inferCNV/cellAnno.txt",row.names = FALSE,col.names = FALSE,sep = "\t")8.2 Analysis
8.2.1 Create InferCNV object
The most complicated task is to prepare the input files, and once the above three files have been created, then the analysis only needs two steps and the parameters are adjusted according to the results.
The first step is to create based on the above three files inferCNVObject
infercnv_obj <- CreateInfercnvObject(raw_counts_matrix = rawexpr,
annotations_file = "../data/inferCNV/cellAnno.txt",
delim = "\t",
gene_order_file = "../data/inferCNV/geneInfo.txt",
ref_group_names = NULL)A key parameter in this step is ref_group_name. Used to set the reference group. If you don’t know which group is normal and which group is abnormal, then set to ref_group_name=NULL, Then inferCNV The global average will be used as the baseline, which is suitable for situations where there are enough cells to differ. In addition, here’s raw_count_matrix. It is a count matrix that excludes low-quality cells.
8.2.2 Running inferCNV
The second step is to run the standard inferCNV Process.
# perform infercnv operations to reveal cnv signal
infercnv_obj <- infercnv::run(infercnv_obj,
cutoff=0.1, # use 1 for smart-seq, 0.1 for 10x-genomics
out_dir="../data/inferCNV/results/", # output folder
cluster_by_groups=F, # cluster
denoise=T, # denoise
HMM=T) # Whether to predict CNV based on HMMcutoff: inferCNV suggests that 10X data set up to 0.1 and smart-seq set as 1;
cluster_by_groups: used to cluster tumor cell according to cell annotation files.
The key parameter is cutoff, Used to select which genes will be used for analysis (the average expression level in all cells needs to be greater than a certain threshold). This needs to be calculated according to the specific sequencing depth. The official tutorial recommends setting 10X to 0.1 and smart-seq to 1. You can first evaluate the number of retained genes under different thresholds to determine the specific value. cluster_by_groups used to declare whether to group tumor cells according to the grouping of the cell annotation file.
This will eventually output a lot of files inout_dirIn the directory, but actually useful are the following:
- infercnv.preliminary.png: preliminary inferCNV display results (without denoising or HMM prediction)
- infercnv.png: The denoising heat map generated by the final inferCNV.
- infercnv.references.txt: Normal cell matrix.
- infercnv.observations.txt: Tumor cell matrix.
- infercnv.observation_groupings.txt: The grouping relationship of tumor cells after clustering.
- infercnv.observations_dendrogram.txt: NEWICK format, showing the hierarchical relationship between cells.
8.3 Classify maligant and non-maligant cells
8.3.1 Define CNV score
Use infercnv.observations.txt files to calculate CNV score
cnv_table <- read.table("../data/inferCNV/results/infercnv.observations.txt", header=T)
# Score cells based on their CNV scores
# Replicate the table
cnv_score_table <- as.matrix(cnv_table)
cnv_score_mat <- as.matrix(cnv_table)
cnv_score_mat[1:3,1:4]## GAACGGACAAATCCGT.4 CCACTACAGTAGCGGT.4 TGGGAAGGTGCCTTGG.4
## NOC2L 0.8969204 1.001434 1.001434
## ISG15 0.8968755 1.001434 1.001434
## AGRN 0.8976021 1.001434 1.001434
## TTGACTTTCCGAAGAG.4
## NOC2L 1.001434
## ISG15 1.001434
## AGRN 1.001434
Malignant cells were defined as those with CNV score above 80th percentile of all CNV scores and CNV correlation score above 0.4. Confidently defined non-malignant cells were those below the two cutoffs. The rest of the cells were also considered as non-malignant cells in this study.
# Scoring
# cnv_score_table[cnv_score_mat > 0 & cnv_score_mat < 0.3] <- "A" #complete loss. 2pts
# cnv_score_table[cnv_score_mat >= 0.3 & cnv_score_mat < 0.7] <- "B" #loss of one copy. 1pts
# cnv_score_table[cnv_score_mat >= 0.7 & cnv_score_mat < 1.3] <- "C" #Neutral. 0pts
# cnv_score_table[cnv_score_mat >= 1.3 & cnv_score_mat <= 1.5] <- "D" #addition of one copy. 1pts
# cnv_score_table[cnv_score_mat > 1.5 & cnv_score_mat <= 2] <- "E" #addition of two copies. 2pts
# cnv_score_table[cnv_score_mat > 2] <- "F" #addition of more than two copies. 2pts
# Check
colnames(cnv_score_table) <- gsub("\\.","\\-",colnames(cnv_score_table))
cnv_score_table[1:3,1:4]## GAACGGACAAATCCGT-4 CCACTACAGTAGCGGT-4 TGGGAAGGTGCCTTGG-4
## NOC2L 0.8969204 1.001434 1.001434
## ISG15 0.8968755 1.001434 1.001434
## AGRN 0.8976021 1.001434 1.001434
## TTGACTTTCCGAAGAG-4
## NOC2L 1.001434
## ISG15 1.001434
## AGRN 1.001434
# Replace with score
cnv_score_table_pts <- cnv_table
rm(cnv_score_mat)
# Scores are stored in “cnv_score_table_pts”. Use colSums to add up scores for each cell and store as vector
cell_scores_CNV <- as.data.frame(colSums(cnv_score_table_pts))
colnames(cell_scores_CNV) <- "cnv_score"
cell_scores_CNV[cell_scores_CNV > quantile(cell_scores_CNV, probs = 0.8)] <- "Malignant"
cell_scores_CNV[cell_scores_CNV <= quantile(cell_scores_CNV, probs = 0.8)] <- "non-Malignant"
rownames(cell_scores_CNV) <- gsub("\\.","\\-",rownames(cell_scores_CNV))
write.csv(x = cell_scores_CNV, file = "../data/inferCNV/cnv_scores.csv")load("../data/inferCNV/cell-score.RData")
table(cell_scores_CNV)## cell_scores_CNV
## Malignant non-Malignant
## 1497 2825
8.3.2 Add metadata
Make sure rowname of meta data is consistant with colname of counts
identical(rownames(cell_scores_CNV),colnames(set1.obj@assays$RNA@counts))## [1] FALSE
Add meta data
set1.obj <- AddMetaData(object = set1.obj, metadata = cell_scores_CNV)
head(set1.obj@meta.data)8.4 Visualization
Difference between malignant and non-malignant cells
# subset the object to only keep those cells
malignant.obj <- subset(set1.obj, cells = WhichCells(set1.obj, expression = cnv_score == "Malignant"))
nonmalignant.obj <- subset(set1.obj, cells = WhichCells(set1.obj, expression = cnv_score == "non-Malignant"))group.by = "Clinical.Info"
p9 <- DimPlot(malignant.obj, reduction = "tsne", group.by = "Clinical.Info", pt.size=0.5, label = TRUE,repel = TRUE) + labs(title="Malignant Cells")+
theme(
plot.title = element_text(hjust = 0.5),
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
p10 <- DimPlot(nonmalignant.obj, reduction = "tsne", group.by = "Clinical.Info", pt.size=0.5, label = TRUE,repel = TRUE) + labs(title="non-Malignant Cells") +
theme(
plot.title = element_text(hjust = 0.5),
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
plot_grid(p9, p10, rel_widths = c(2,2))group.by = "Type"
p11 <- DimPlot(malignant.obj, reduction = "tsne", group.by = "Type", pt.size=0.5, label = TRUE,repel = TRUE) + labs(title="Malignant Cells")+
theme(
plot.title = element_text(hjust = 0.5),
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
p12 <- DimPlot(nonmalignant.obj, reduction = "tsne", group.by = "Type", pt.size=0.5, label = TRUE,repel = TRUE) + labs(title="non-Malignant Cells") +
theme(
plot.title = element_text(hjust = 0.5),
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
plot_grid(p11, p12, rel_widths = c(2,2))group.by = "HPCA_Main"
p13 <- DimPlot(malignant.obj, reduction = "tsne", group.by = "HPCA_Main", pt.size=0.5, label = TRUE,repel = TRUE) + labs(title="Malignant Cells")+
theme(
plot.title = element_text(hjust = 0.5),
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
p14 <- DimPlot(nonmalignant.obj, reduction = "tsne", group.by = "HPCA_Main", pt.size=0.5, label = TRUE,repel = TRUE) + labs(title="non-Malignant Cells") +
theme(
plot.title = element_text(hjust = 0.5),
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
plot_grid(p13, p14, rel_widths = c(2,2))- Number of malignant cells
make input file1
dat <- set1.obj@meta.data[,c("Clinical.Info","cnv_score")]
dat$pecent.ma <- 1
dat$Clinical.Info <- as.factor(dat$Clinical.Info)
dat$cnv_score <- as.factor(dat$cnv_score)
dat.bind <- aggregate(dat$pecent.ma, by = list(grp.sample = dat$Clinical.Info, grp.status = dat$cnv_score),sum)
total <- aggregate(dat.bind$x, by = list(grp.sample = dat.bind$grp.sample),sum)
dat.bind <- merge(dat.bind,total,all.x = T, by = "grp.sample")
dat.bind$pecent <- dat.bind$x.x/dat.bind$x.y
dat.bind$grp.status <- factor(dat.bind$grp.status, levels = c("non-Malignant","Malignant"))
head(dat.bind)library(ggsci)
ggplot(dat.bind,aes(grp.sample,pecent,fill=grp.status))+
geom_bar(stat="identity",position="fill")+
ggtitle("Number of malignant cells")+
scale_fill_simpsons(alpha = 0.9)+
theme(axis.ticks.length=unit(0.5,'cm'))+
guides(fill=guide_legend(title=NULL))Make input file2
dat <- nonmalignant.obj@meta.data[,c("Clinical.Info","Type")]
dat$pecent <- 1
dat$Clinical.Info <- as.factor(dat$Clinical.Info)
dat$Type <- as.factor(dat$Type)
dat.bind <- aggregate(dat$pecent, by = list(grp.sample = dat$Clinical.Info, grp.cell = dat$Type),sum)
total <- aggregate(dat.bind$x, by = list(grp.cell = dat.bind$grp.cell),sum)
dat.bind <- merge(dat.bind,total,all.x = T, by = "grp.cell")
dat.bind$pecent <- dat.bind$x.x/dat.bind$x.y
head(dat.bind)library(ggsci)
library(ggplot2)
dat.bind$grp.cell <- factor(dat.bind$grp.cell)
ggplot(dat.bind,aes(grp.sample,pecent,fill=grp.cell))+
geom_bar(stat="identity",position="fill")+
ggtitle("Number of malignant cells")+
scale_fill_simpsons(alpha = 0.9)+
theme(axis.ticks.length=unit(0.5,'cm'))+
guides(fill=guide_legend(title=NULL))9. Cell trajectory
The “pseudotime” is defined as the positioning of cells along the trajectory that quantifies the relative activity or progression of the underlying biological process. The most common application is to fit models to gene expression against the pseudotime to identify the genes responsible for generating the trajectory in the first place, especially around interesting branch events.
library(monocle3)
library(tidyverse)
library(patchwork)
library(harmony)Monocle, from the Trapnell Lab, is a piece of the TopHat suite that performs differential expression, trajectory, and pseudotime analyses on single cell RNA-Seq data.
9.1 Create a Monocle CDS Object
- Expression matrix
we first examined the cell trajectory of malignant cells for each tumor by using a reversed graph embedding method.
# Create an expression matrix
expression_matrix <- malignant.obj@assays$RNA@counts
expression_matrix_df <- malignant.obj@assays$RNA@counts %>% dgc_as_matrix() %>% as.data.frame()
head(expression_matrix_df)- Cell metadata
# Get cell metadata
cell_metadata <- malignant.obj@meta.data
head(cell_metadata)cell_metadata_sort <- data.frame(matrix(NA, nrow(cell_metadata)),2)
cell_metadata_sort$CB <- colnames(expression_matrix_df)
cell_metadata_sort <- plyr::join(cell_metadata_sort, cell_metadata, by = "CB")
cell_metadata_sort <- cell_metadata_sort[,-c(1,2)]
rownames(cell_metadata_sort) <- cell_metadata_sort$CBidentical(rownames(cell_metadata_sort), expression_matrix@Dimnames[[2]])## [1] TRUE
- Gene annotations
# get gene annotations
gene_annotation <- data.frame(gene_short_name = rownames(malignant.obj@assays$RNA), row.names = rownames(malignant.obj@assays$RNA))
head(gene_annotation)identical(rownames(gene_annotation), rownames(expression_matrix))## [1] TRUE
- Create Seurat-derived DCS
# Seurat-derived CDS
my.cds <- new_cell_data_set(expression_matrix,
cell_metadata = cell_metadata_sort,
gene_metadata = gene_annotation)
rm(cell_metadata, cell_metadata_sort, expression_matrix, expression_matrix_df, gene_annotation)9.2 Transfer Seurat embeddings
# preprocess_cds函数相当于seurat中NormalizeData+ScaleData+RunPCA
my.cds <- preprocess_cds(my.cds, num_dim = 30)
# umap降维
my.cds <- reduce_dimension(my.cds, preprocess_method = "PCA")
plot_cells(my.cds, reduction_method="PCA", color_cells_by="Type") + ggtitle('cds.pca')Import Seurat data
##从seurat导入整合过的umap坐标
my.cds@int_colData@listData$reducedDims$UMAP <- malignant.obj@reductions$umap@cell.embeddings
plot_cells(my.cds, reduction_method="UMAP", color_cells_by="Type") + ggtitle('int.umap')# Copy cluster info from Seurat
DF.name <- colnames(malignant.obj@meta.data)[grepl("DF.classification", colnames(malignant.obj@meta.data))]
my.cds@clusters$UMAP_so$clusters <- malignant.obj@meta.data$DF.name9.3 Run Monocle3
9.3.1 Cluster
## Monocle3
my.cds <- cluster_cells(my.cds)
plot_cells(my.cds, show_trajectory_graph = FALSE) + ggtitle("label by clusterID")plot_cells(my.cds, color_cells_by = "cluster", show_trajectory_graph = FALSE) + ggtitle("label by cluster")9.3.2 Recognize trajectory
## 识别轨迹
my.cds <- learn_graph(my.cds, use_partition = T, verbose = FALSE)##
|
| | 0%
|
|======================================================================| 100%
p <- plot_cells(my.cds, reduction_method = "UMAP", color_cells_by = "Type", group_cells_by = "cluster", label_groups_by_cluster = T, label_leaves = T, label_branch_points = FALSE, show_trajectory_graph = T)
p9.3.4 Sort by pseudotime
##细胞按拟时排序
# my.cds <- order_cells(my.cds)
# saveRDS(my.cds,"../data/celltrajectory/mycds.rds")
my.cds<-readRDS("../data/celltrajectory/mycds.rds")
# p + geom_vline(xintercept = seq(-7,-6,0.25)) + geom_hline(yintercept = seq(0,1,0.25))
# embed <- data.frame(Embeddings(set1.obj, reduction = "umap"))
# embed <- subset(embed, (UMAP_1 > -6.75 & UMAP_1 < -6.5) & (UMAP_2 > 0.24 & UMAP_2 < 0.26))
# root.cell <- rownames(embed)
# my.cds <- order_cells(my.cds, root_cells = root.cell)
plot_cells(my.cds, color_cells_by = "pseudotime", label_cell_groups = FALSE,
label_leaves = FALSE, label_branch_points = FALSE)9.3.5 Find DEGs
##寻找拟时轨迹差异基因
# graph_test分析最重要的结果是莫兰指数(morans_I),其值在-1至1之间,0代表此基因没有
# 空间共表达效应,1代表此基因在空间距离相近的细胞中表达值高度相似。
Track_genes <- graph_test(my.cds, neighbor_graph="knn", cores=24, verbose = FALSE)#挑选top10画图展示
Track_genes_sig <- Track_genes %>% top_n(n=10, morans_I) %>%
pull(gene_short_name) %>% as.character()#基因表达趋势图
plot_genes_in_pseudotime(my.cds[Track_genes_sig,], color_cells_by="Type",
min_expr=0.5, ncol = 2)#FeaturePlot图
plot_cells(my.cds, genes=Track_genes_sig, show_trajectory_graph=FALSE,
label_cell_groups=FALSE, label_leaves=FALSE)Find co-expression modules:
genelist <- pull(Track_genes, gene_short_name) %>% as.character()
gene_module <- find_gene_modules(my.cds[genelist,], resolution=1e-2, cores = 6)
cell_group <- tibble::tibble(cell=row.names(colData(my.cds)),
cell_group=colData(my.cds)$Clinical.Info)
agg_mat <- aggregate_gene_expression(my.cds, gene_module, cell_group)
row.names(agg_mat) <- stringr::str_c("Module ", row.names(agg_mat))
pheatmap::pheatmap(agg_mat, clustering_method="ward.D2")Intratumor transcriptomic biodiversity among tumors could be further revealed by correlation analysis of all malignant cells
Correlation files
tmp.data <- dgc_as_matrix(malignant.obj@assays$RNA@counts) %>% as.data.frame()
tmp.data <- cor(tmp.data)
tmp.data[1:4,1:3]## AAACCTGAGGCGTACA-1 AAACGGGAGATCGATA-1 AAAGCAAAGATCGGGT-1
## AAACCTGAGGCGTACA-1 1.0000000 0.9388450 0.5938179
## AAACGGGAGATCGATA-1 0.9388450 1.0000000 0.6240768
## AAAGCAAAGATCGGGT-1 0.5938179 0.6240768 1.0000000
## AACACGTCACGGCTAC-1 0.8920322 0.8632513 0.5566751
Annotation Files
annotation_row = data.frame(
Clinical = factor(malignant.obj$Classification),
Sample = factor(malignant.obj$Clinical.Info)
)
annotation_col = data.frame(
Clinical = factor(malignant.obj$Classification),
Sample = factor(malignant.obj$Clinical.Info)
)
head(annotation_col)ann_colors = list( Clinical = c(HCC = "#D87F23", iCCA = "#A5A5A5"), Sample = c(H18 = "#fad3cf", H21 = "#2d095c", H23 = "#20366b", H28 ="#dd7777", H30 = "#eae3e3", H37 = "#2E94B9", H38 = "#FFFDC0", C25 = "#F0B775", C26 = "#D25565", C29 = "#a39e9e", C35 = "#2470a0", C39 = "#a696c8"))Compare with the result in the paper
10. Cell-cell communication
CellChat: Inference and analysis of cell-cell communication GitHub-CellChat
- It is able to analyze cell-cell communication for continuous states along cellular development trajectories.
- It can quantitatively characterize and compare the inferred cell-cell communication networks using an integrated approach by combining social network analysis, pattern recognition, and manifold learning approaches.
- It provides an easy-to-use tool for extracting and visualizing high-order information of the inferred networks. For example, it allows ready prediction of major signaling inputs and outputs for all cell populations and how these populations and signals coordinate together for functions.
- It provides several visualization outputs to facilitate intuitive user-guided data interpretation.
library(CellChat)10.1 Create CellChat Object
Extract input files from Seurat object. The normalized count data and cell group information can be obtained from the Seurat object by
data.input <- GetAssayData(set1.obj, assay = "RNA", slot = "data") # normalized data matrix
labels <- set1.obj$Type
meta <- data.frame(group = labels, row.names = names(labels)) # create a dataframe of the cell labelsmy.cellchat <- createCellChat(object = data.input, meta = meta, group.by = "group")## The cell groups used for CellChat analysis are B CAF HPC-like Malignant T TAM TEC unclassified
10.2 Data preprocessing
CellChatDB <- CellChatDB.human # use CellChatDB.mouse if running on mouse data
showDatabaseCategory(CellChatDB)use a subset of CellChatDB for cell-cell communication analysis
CellChatDB.use <- subsetDB(CellChatDB, search = "Secreted Signaling")
# set the used database in the object
my.cellchat@DB <- CellChatDB.useTo infer the cell state-specific communications, we identify over-expressed ligands or receptors in one cell group and then identify over-expressed ligand-receptor interactions if either ligand or receptor is over-expressed.
# subset the expression data of signaling genes for saving computation cost
my.cellchat <- CellChat::subsetData(my.cellchat, features = NULL) # This step is necessary even if using the whole database
# future::plan("multiprocess", workers = 6) # do parallel
my.cellchat <- identifyOverExpressedGenes(my.cellchat)
my.cellchat <- identifyOverExpressedInteractions(my.cellchat)
# project gene expression data onto PPI network (optional)
my.cellchat <- projectData(my.cellchat, PPI.human)10.3 Inference of cell-cell communication network
CellChat infers the biologically significant cell-cell communication by assigning each interaction with a probability value and peforming a permutation test. CellChat models the probability of cell-cell communication by integrating gene expression with prior known knowledge of the interactions between signaling ligands, receptors and their cofactors using the law of mass action.
Compute the communication probability and infer cellular communication network
my.cellchat <- computeCommunProb(my.cellchat)
# Filter out the cell-cell communication if there are only few number of cells in certain cell groups
my.cellchat <- filterCommunication(my.cellchat, min.cells = 10)Extract the inferred cellular communication network as a data frame
df.net <- subsetCommunication(my.cellchat)
head(df.net)returns a data frame consisting of all the inferred cell-cell communications at the level of ligands/receptors. Set slot.name = "netP" to access the the inferred communications at the level of signaling pathways
Infer the cell-cell communication at a signaling pathway leve
NB: The inferred intercellular communication network of each ligand-receptor pair and each signaling pathway is stored in the slot ‘net’ and ‘netP’, respectively.
my.cellchat <- computeCommunProbPathway(my.cellchat)Calculate the aggregated cell-cell communication network
We can calculate the aggregated cell-cell communication network by counting the number of links or summarizing the communication probability.
my.cellchat <- aggregateNet(my.cellchat)10.4 Visualization of cell-cell communication network
We can also visualize the aggregated cell-cell communication network. For example, showing the number of interactions or the total interaction strength (weights) between any two cell groups using circle plot.
my.cellchat <- readRDS("../data/cellchat/cellchat-obj.rds")groupSize <- as.numeric(table(my.cellchat@idents))
par(mfrow = c(1,2), xpd=TRUE)
netVisual_circle(my.cellchat@net$count, vertex.weight = groupSize, weight.scale = T, label.edge= F, title.name = "Number of interactions")
netVisual_circle(my.cellchat@net$weight, vertex.weight = groupSize, weight.scale = T, label.edge= F, title.name = "Interaction weights/strength")my.cellchat@netP$pathways## [1] "FGF" "IL4" "CD40" "FLT3"
future::plan("multiprocess", workers = 6) # do parallel
netVisual_aggregate(my.cellchat, signaling = "FGF", layout = "chord")netVisual_aggregate(my.cellchat, signaling = "IL4", layout = "chord")netVisual_aggregate(my.cellchat, signaling = "CD40", layout = "chord")netVisual_aggregate(my.cellchat, signaling = "FLT3", layout = "chord")rm(mat, mat2, meta, df.net, groupSize, i, labels, data.input,CellChatDB,CellChatDB.use)
# saveRDS(my.cellchat, file = "../data/cellchat/cellchat-obj.rds")
rm(my.cellchat)11. RNA velocity
GitHub-Velocyto
Homepage-Velocyto
velocyto is a package for the analysis of expression dynamics in single cell RNA seq data.
Generating Loom files To start, we will be generating loom files (a file format designed for genomics datasets such as single-cell) for every single-cell sample you used in your Seurat analysis. A loom file is different from the file format you used in Seurat. A loom file has to be generated from the original FASTQ or BAM files for your samples(s).
Skip RNA velocity due to the absence of FASTQ or BAM files
12. Enrichment
12.1 GO/KEGG
library(clusterProfiler)
library(org.Hs.eg.db)
library(ggplot2)Comparison between cluster0 and cluster1
dge.cluster <- Seurat::FindMarkers(set1.obj,ident.1 = 0, ident.2 = 1)
sig_dge.cluster <- subset(dge.cluster, p_val_adj<0.01&abs(avg_log2FC)>1)Go analysis:
ego_ALL <- enrichGO(gene = row.names(sig_dge.cluster),
#universe = row.names(dge.celltype),
OrgDb = 'org.Hs.eg.db',
keyType = 'SYMBOL',
ont = "ALL",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05)
ego_all <- data.frame(ego_ALL)
write.csv(ego_all,'../output/12.Enrichment/enrichGO.csv') ego_CC <- enrichGO(gene = row.names(sig_dge.cluster),
#universe = row.names(dge.celltype),
OrgDb = 'org.Hs.eg.db',
keyType = 'SYMBOL',
ont = "CC",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05)
ego_MF <- enrichGO(gene = row.names(sig_dge.cluster),
#universe = row.names(dge.celltype),
OrgDb = 'org.Hs.eg.db',
keyType = 'SYMBOL',
ont = "MF",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05)
ego_BP <- enrichGO(gene = row.names(sig_dge.cluster),
#universe = row.names(dge.celltype),
OrgDb = 'org.Hs.eg.db',
keyType = 'SYMBOL',
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05)
ego_CC@result$Description <- substring(ego_CC@result$Description,1,70)
ego_MF@result$Description <- substring(ego_MF@result$Description,1,70)
ego_BP@result$Description <- substring(ego_BP@result$Description,1,70)
p_BP <- barplot(ego_BP,showCategory = 10) + ggtitle("barplot for Biological process")
p_CC <- barplot(ego_CC,showCategory = 10) + ggtitle("barplot for Cellular component")
p_MF <- barplot(ego_MF,showCategory = 10) + ggtitle("barplot for Molecular function")
plotc <- p_BP/p_CC/p_MF
plotcggsave('../output/12.Enrichment/enrichGO.png', plotc, width = 12,height = 10)
genelist <- bitr(row.names(sig_dge.cluster), fromType="SYMBOL",
toType="ENTREZID", OrgDb='org.Hs.eg.db')
genelist <- pull(genelist,ENTREZID) ekegg <- enrichKEGG(gene = genelist, organism = 'hsa')
p1 <- barplot(ekegg, showCategory=20)
p2 <- dotplot(ekegg, showCategory=20)
plotc2 <- p1/p2
plotc2ggsave("../output/12.Enrichment/enrichKEGG.png", plot = plotc2, width = 12, height = 10)12.2 GSVA
library(msigdbr) # provide MSigdb gene set
library(dplyr)
library(GSVA)
library(pheatmap)
library(patchwork)MSigdb(Molecular Signatures Database)contains 9 different gene set:
- H:hallmark gene sets—癌症特征基因集合,共50 gene sets。
- C1:positional gene sets—染色体位置基因集合,共299 gene sets。
- C2:curated gene sets—包含了已知数据库,文献等基因集信息,包含5529 gene sets。
- C3:motif gene sets—调控靶基因集合,包括miRNA-target以及转录因子-target调控模式,3735 gene sets。
- C4:computational gene sets—计算机软件预测出来的基因集合,主要是和癌症相关的基因,858 gene sets。
- C5:GO gene sets—Gene Ontology对应的基因集合,10192 gene sets。
- C6:oncogenic signatures—致癌基因集合,大部分来源于NCBI GEO发表芯片数据,189 gene sets。
- C7:immunologic signatures—免疫相关基因集,4872 gene sets。
- C8:single cell identitiy gene sets, 302 gene sets
meta <- set1.obj@meta.data[,c("Clinical.Info","seurat_clusters","Classification")]
m_df <- msigdbr(species = "Homo sapiens", category = "C2", subcategory = "CP:KEGG")
msigdbr_list <- split(x = m_df$gene_symbol, f = m_df$gs_name)
expr <- as.matrix(set1.obj@assays$RNA@counts) If I don’t use job package, R would be shut down easily! But RMarkdown doesn’t support job package because it’s actually a RStudio plugin.
job::job(kegg.info= { kegg <- gsva(expr, msigdbr_list, kcdf="Gaussian",method = "gsva", parallel.sz = 16)
})
save(kegg.info, file = "../data/Enrichment/kegg-info.RData")es <- data.frame(t(kegg.info$kegg),stringsAsFactors=F) Add meta data
set1.obj <- AddMetaData(set1.obj, es)p1 <- FeaturePlot(set1.obj, features = "KEGG_LIMONENE_AND_PINENE_DEGRADATION", reduction = 'umap')
p2 <- FeaturePlot(set1.obj, features = "KEGG_ETHER_LIPID_METABOLISM", reduction = 'umap')
p3 <- FeaturePlot(set1.obj, features = "KEGG_RIBOSOME", reduction = 'umap')
p4 <- FeaturePlot(set1.obj, features = "KEGG_ASTHMA", reduction = 'umap')
plotc = (p1|p2)/(p3|p4)
ggsave("../output/12.Enrichment/gsva_featureplot.png", plotc, width = 12, height = 8)
plotcHeatmap of every cluster and their function
meta <- meta %>%arrange(meta$seurat_clusters)
data <- kegg.info$kegg[,rownames(meta)]
group <- factor(meta[,"seurat_clusters"],ordered = F)
data1 <-NULL
for(i in 0:(length(unique(group))-1)){
ind <-which(group==i)
dat <- apply(data[,ind], 1, mean)
data1 <-cbind(data1,dat)
}
colnames(data1) <- paste("Cluster",1:17)
result<- t(scale(t(data1)))
result[result>2]=2
result[result<-2]=-2p <- pheatmap(result[1:20,],
cluster_rows = F,
cluster_cols = T,
show_rownames = T,
show_colnames = T,
color =colorRampPalette(c("blue", "white","red"))(100),
cellwidth = 10, cellheight = 18,
fontsize = 6)pdf(("../output/12.Enrichment/gsva_celltype.pdf"),width = 8,height = 8)
print(p)
dev.off()## png
## 2
Code and output availability
References
R Markdown Cookbook
2020-Nat Protoc.-Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data
Sanger-scRNAseq Course
Broad Institute-ANALYSIS OF SINGLE CELL RNA-SEQ DATA
Seurat: Quality control
DoubletFinder:利用人造的最近邻域检测单细胞RNA测序中的doublets
scRNAseq双细胞去除-1: DoubletFinder
Use inferCNV to analyze copy number variation in single-cell transcriptome
单细胞转录组(16)10X inferCNV学习笔记
单细胞转录组高级分析四:scRNA数据推断CNV
Using Monocle 3 with Seurat 3 integrated object #1658
How to use Seurat Generated UMAP for Monocle3 pseudotime analyse? #388
Getting started with Monocle 3
CellChat:细胞间相互作用分析利器
Inference and analysis of cell-cell communication using CellChat
Interface with other single-cell analysis toolkits
Appendix
sessionInfo()## R version 4.0.4 (2021-02-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] pheatmap_1.0.12 GSVA_1.38.2
## [3] msigdbr_7.4.1 org.Hs.eg.db_3.12.0
## [5] AnnotationDbi_1.52.0 clusterProfiler_3.18.1
## [7] CellChat_1.0.0 bigmemory_4.5.36
## [9] igraph_1.2.6 harmony_1.0
## [11] Rcpp_1.0.6 forcats_0.5.1
## [13] purrr_0.3.4 readr_1.4.0
## [15] tidyr_1.1.3 tibble_3.1.2
## [17] tidyverse_1.3.1 monocle3_1.0.0
## [19] ggsci_2.9 AnnoProbe_0.1.0
## [21] infercnv_1.6.0 scater_1.18.6
## [23] SingleCellExperiment_1.12.0 celldex_1.0.0
## [25] SingleR_1.4.1 SummarizedExperiment_1.20.0
## [27] Biobase_2.50.0 GenomicRanges_1.42.0
## [29] GenomeInfoDb_1.26.7 IRanges_2.24.1
## [31] S4Vectors_0.28.1 BiocGenerics_0.36.1
## [33] MatrixGenerics_1.2.1 matrixStats_0.58.0
## [35] clustree_0.4.3 ggraph_2.0.5
## [37] ggplot2_3.3.3 sctransform_0.3.2
## [39] patchwork_1.1.1 DoubletFinder_2.0.3
## [41] stringr_1.4.0 Cairo_1.5-12.2
## [43] cowplot_1.1.1 dplyr_1.0.6
## [45] SeuratObject_4.0.1 Seurat_4.0.1
##
## loaded via a namespace (and not attached):
## [1] graphlayouts_0.7.1 pbapply_1.4-3
## [3] lattice_0.20-44 haven_2.4.1
## [5] vctrs_0.3.8 mgcv_1.8-35
## [7] blob_1.2.1 survival_3.2-11
## [9] spatstat.data_2.1-0 later_1.2.0
## [11] DBI_1.1.1 rappdirs_0.3.3
## [13] uwot_0.1.10 dqrng_0.3.0
## [15] zlibbioc_1.36.0 htmlwidgets_1.5.3
## [17] mvtnorm_1.1-1 GlobalOptions_0.1.2
## [19] future_1.21.0 leiden_0.3.7
## [21] irlba_2.3.3 tidygraph_1.2.0
## [23] KernSmooth_2.23-20 DT_0.18
## [25] promises_1.2.0.1 DelayedArray_0.16.3
## [27] limma_3.46.0 graph_1.68.0
## [29] RSpectra_0.16-0 fs_1.5.0
## [31] fastmatch_1.1-0 digest_0.6.27
## [33] png_0.1-7 rjags_4-10
## [35] bluster_1.0.0 scatterpie_0.1.6
## [37] DOSE_3.16.0 pkgconfig_2.0.3
## [39] GO.db_3.12.1 gridBase_0.4-7
## [41] DelayedMatrixStats_1.12.3 ggbeeswarm_0.6.0
## [43] iterators_1.0.13 statnet.common_4.4.1
## [45] reticulate_1.20 prettydoc_0.4.1
## [47] network_1.16.1 circlize_0.4.12
## [49] beeswarm_0.3.1 modeltools_0.2-23
## [51] GetoptLong_1.0.5 xfun_0.23
## [53] bslib_0.2.5.1 zoo_1.8-9
## [55] tidyselect_1.1.1 reshape2_1.4.4
## [57] ica_1.0-2 viridisLite_0.4.0
## [59] rlang_0.4.11 jquerylib_0.1.4
## [61] glue_1.4.2 RColorBrewer_1.1-2
## [63] registry_0.5-1 modelr_0.1.8
## [65] lambda.r_1.2.4 pkgmaker_0.32.2
## [67] ggsignif_0.6.1 labeling_0.4.2
## [69] httpuv_1.6.1 BiocNeighbors_1.8.2
## [71] TH.data_1.0-10 grr_0.9.5
## [73] DO.db_2.9 annotate_1.68.0
## [75] jsonlite_1.7.2 XVector_0.30.0
## [77] bit_4.0.4 mime_0.10
## [79] systemfonts_1.0.2 gridExtra_2.3
## [81] gplots_3.1.1 stringi_1.6.2
## [83] spatstat.sparse_2.0-0 scattermore_0.7
## [85] bitops_1.0-7 cli_2.4.0
## [87] RSQLite_2.2.7 bigmemory.sri_0.1.3
## [89] libcoin_1.0-8 data.table_1.14.0
## [91] rstudioapi_0.13 nlme_3.1-152
## [93] qvalue_2.22.0 scran_1.18.7
## [95] fastcluster_1.1.25 locfit_1.5-9.4
## [97] listenv_0.8.0 miniUI_0.1.1.1
## [99] leidenbase_0.1.3 dbplyr_2.1.1
## [101] gg.gap_1.3 speedglm_0.3-3
## [103] readxl_1.3.1 lifecycle_1.0.0
## [105] ExperimentHub_1.16.1 munsell_0.5.0
## [107] cellranger_1.1.0 ggalluvial_0.12.3
## [109] caTools_1.18.2 codetools_0.2-18
## [111] coda_0.19-4 vipor_0.4.5
## [113] lmtest_0.9-38 xtable_1.8-4
## [115] ROCR_1.0-11 formatR_1.9
## [117] BiocManager_1.30.15 abind_1.4-5
## [119] farver_2.1.0 FNN_1.1.3
## [121] parallelly_1.25.0 AnnotationHub_2.22.1
## [123] RANN_2.6.1 GEOquery_2.58.0
## [125] RcppAnnoy_0.0.18 goftest_1.2-2
## [127] futile.options_1.0.1 cluster_2.1.2
## [129] future.apply_1.7.0 Matrix_1.3-3
## [131] ellipsis_0.3.2 lubridate_1.7.10
## [133] ggridges_0.5.3 reprex_2.0.0
## [135] fgsea_1.16.0 argparse_2.0.3
## [137] spatstat.utils_2.1-0 htmltools_0.5.1.1
## [139] BiocFileCache_1.14.0 yaml_2.2.1
## [141] NMF_0.23.0 utf8_1.2.1
## [143] plotly_4.9.3 interactiveDisplayBase_1.28.0
## [145] XML_3.99-0.6 ggpubr_0.4.0
## [147] foreign_0.8-81 withr_2.4.2
## [149] scuttle_1.0.4 fitdistrplus_1.1-3
## [151] BiocParallel_1.24.1 bit64_4.0.5
## [153] rngtools_1.5 multcomp_1.4-17
## [155] foreach_1.5.1 spatstat.core_2.1-2
## [157] GOSemSim_2.16.1 rsvd_1.0.5
## [159] memoise_2.0.0 evaluate_0.14
## [161] rio_0.5.26 curl_4.3.1
## [163] fansi_0.4.2 highr_0.9
## [165] GSEABase_1.52.1 tensor_1.5
## [167] edgeR_3.32.1 checkmate_2.0.0
## [169] cachem_1.0.5 deldir_0.2-10
## [171] babelgene_21.4 rjson_0.2.20
## [173] openxlsx_4.2.3 rstatix_0.7.0
## [175] ggrepel_0.9.1 clue_0.3-59
## [177] tools_4.0.4 sass_0.4.0
## [179] sandwich_3.0-1 magrittr_2.0.1
## [181] RCurl_1.98-1.3 proxy_0.4-25
## [183] car_3.0-10 ape_5.5
## [185] xml2_1.3.2 httr_1.4.2
## [187] assertthat_0.2.1 rmarkdown_2.8
## [189] globals_0.14.0 R6_2.5.0
## [191] gtools_3.8.2 shape_1.4.6
## [193] statmod_1.4.36 coin_1.4-1
## [195] beachmat_2.6.4 BiocVersion_3.12.0
## [197] sna_2.6 BiocSingular_1.6.0
## [199] splines_4.0.4 rle_0.9.2
## [201] carData_3.0-4 colorspace_2.0-1
## [203] generics_0.1.0 pillar_1.6.1
## [205] tweenr_1.0.2 rvcheck_0.1.8
## [207] GenomeInfoDbData_1.2.4 plyr_1.8.6
## [209] gtable_0.3.0 futile.logger_1.4.3
## [211] Matrix.utils_0.9.8 rvest_1.0.0
## [213] zip_2.1.1 knitr_1.33
## [215] ComplexHeatmap_2.6.2 shadowtext_0.0.8
## [217] fastmap_1.1.0 doParallel_1.0.16
## [219] broom_0.7.6 scales_1.1.1
## [221] backports_1.2.1 enrichplot_1.10.2
## [223] hms_1.1.0 ggforce_0.3.3
## [225] Rtsne_0.15 shiny_1.6.0
## [227] polyclip_1.10-0 grid_4.0.4
## [229] lazyeval_0.2.2 crayon_1.4.1
## [231] MASS_7.3-54 downloader_0.4
## [233] sparseMatrixStats_1.2.1 viridis_0.6.1
## [235] reshape_0.8.8 svglite_2.0.0
## [237] rpart_4.1-15 compiler_4.0.4
## [239] spatstat.geom_2.1-0